In the eukaryotic cell, both secreted and plasma membrane
proteins are synthesized at the endoplasmic reticulum, then 
transported, via the Golgi complex, to the cell surface. Each
of the compartments of this transport pathway carries out
particular metabolic functions, and therefore presumably
contains a distinct complement of membrane proteins. Thus,
mechanisms must exist for localizing such proteins to their 
respective destinations. However, a major obstacle to the study 
of such mechanisms is that the isolation and detailed analysis 
of such internal membrane proteins pose formidable technical 
problems. We have therefore used the E1 glycoprotein from
coronavirus MHV-A59 as a viral model for this class of protein.
Here we present the primary structure of the protein,
determined by analysis of cDNA clones prepared from viral mRNA.
In combination with a previous study of its assembly into the
endoplasmic reticulum membrane, the sequence reveals
several unusual features of the protein which may be related 
to its intracellular localization. 

The coronaviruses are a diverse class of enveloped RNA 
viruses of considerable medical and agricultural significance; 
they also provide a model for the study of persistent viral 
infections (see ref. 10 for review). In contrast to many enveloped 
viruses, the coronavirus mouse hepatitis virus (MHV) A59 buds 
inside the cell, into the lumen of the endoplasmic reticulum.
The assembled virion then appears to travel, via the Golgi 
complex, to the cell surface. Of the two viral membrane proteins, 
the smaller one, E1, is necessary for formation of the envelope, 
and is restricted to internal cell membranes; apparently it only 
reaches the cell surface as part of the budded virion. Thus,
the E1 glycoprotein is potentially a convenient model for studying
those features of a membrane protein that determine its
arrest at a particular destination on the membrane transport 
pathway. 

The mRNAs of MHV-A59 form a ‘nested set’: the seven
RNAs share the 3’ region of the positive-stranded genome, but
extend to different lengths towards the 5’ end. From each
RNA, only the 5’ gene is translated. In addition, a non-coding
‘leader’ sequence of approximately 70 bases, from the
5’ end of the genome, is common to the mRNAs (ref. 17). The
E1 gene is second from the 3’ end and is therefore translated
from the second smallest mRNA, RNA 6 (refs 19, 20). The
sequence of the 3’-terminal gene, encoding the viral
nucleocapsid protein, has been determined previously.

Copy DNA clones spanning the E1 gene were prepared by
two methods and sequenced in the vectors M13mp8 (ref.
26) or pEMBL8 (ref. 27) by the chain-termination method
(data available on request). A sequence of 780 nucleotides
(Fig. 1), containing a single long open reading frame, precedes 
the coding region for the viral nucleocapsid protein. A leader 
of 76 nucleotides, almost identical to the leader of the smallest 
mRNA, RNA 7 (ref. 22), lies in front of the first potential 
initiator codon. Thus, the sequence in Fig. 1 represents the 5’ 
end of RNA 6, encoding the E1 protein and starting at or near 
the extreme 5’-terminal nucleotide. 
LETTERS TO NATURE


CCTATAAGAGTGATTGGCGTCCGTACGTACCCTCTCAACTCTAAAACTCTTGTAGTTTAA
        10                  30                  50

                MetSerSerThrThrGlnAlaProGluProValTyrGlnTrpTh
ATCTAATCCAAACATTATGAGTAGTACTACTCAGGCCCCAGAGCCCGTCTATCAATGGAC
        70                  90                 110

rAlaAspGluAlaValGlnPheLeuLysGluTrpAsnPheSerLeuGlyIleIleLeuLe
GGCCGACGAGGCAGTTCAATTCCTTAAGGAATGGAACTTCTCGTTGGGCATTATACTACT
       130                 150                 170

uPheIleThrIleIleLeuGlnPheGlyTyrThrSerArgSerMetPheIleTyrValVa
CTTTATTACTATCATACTACAGTTCGGTTACACGAGCCGTAGCATGTTTATTTATGTTGT
       190                 210                 230

lLysMetIleIleLeuTrpLeuMetTrpProLeuThrIleValLeuCysIlePheAsnCy
GAAAATGATAATCTTGTGGTTAATGTGGCCACTGACTATTGTTTTGTGTATTTTCAATTG
       250                 270                 290

sValTyrAlaLeuAsnAsnValTyrLeuGlyPheSerIleValPheThrIleValSerIl
CGTGTATGCGCTAAATAATGTGTATCTTGGATTTTCTATAGTGTTTACTATAGTGTCCAT
       310                 330                 350

eValIleTrpIleMetTyrPheValAsnSerIleArgLeuPheIleArgThrGlySerTr
TGTAATCTGGATTATGTATTTTGTTAATAGCATAAGGTTGTTTATCAGGACTGGTAGCTG
       370                 390                 410

pTrpSerPheAsnProGluThrAsnAsnLeuMetCysIleAspMetLysGlyThrValTy
GTGGAGCTTCAACCCCGAAACAAACAACCTTATGTGTATAGATATGAAAGGTACCGTGTA
       430                 450                 470

rValArgProIleIleGluAspTyrHisThrLeuThrAlaThrIleIleArgGlyHisLe
TGTTAGACCCATTATTGAGGATTACCATACACTAACAGCCACTATTATTCGTGGCCACCT
       490                 510                 530

uTyrMetGlnGlyValLysLeuGlyThrGlyPheSerLeuSerAspLeuProAlaTyrVa
CTACATGCAAGGTGTTAAGCTAGGCACCGGTTTCTCTTTGTCTGACTTGCCCGCTTATGT
       550                 570                 590

lThrValAlaLysValSerHisLeuCysThrTyrLysArgAlaPheLeuAspLysValAs
TACAGTTGCTAAGGTGTCACACCTTTGCACTTATAAGCGCGCATTCTTAGACAAGGTAGA
       610                 630                 650

pGlyValSerGlyPheAlaValTyrValLysSerLysValGlyAsnTyrArgLeuProSe
CGGTGTTAGCGGTTTTGCTGTTTATGTGAAGTCCAAGGTCGGAAATTACCGACTGCCCTC
       670                 690                 710

rAsnLysProSerGlyAlaAspThrAlaLeuLeuArgIle*
AAACAAACCGAGTGGCGCGGACACCGCATTGTTGAGAATCTAATCTAAACTTTAAGGATG
       730                 750                 770


Fig. 1 Sequence of the E1 cDNA and protein extending to the
initiator codon of the adjacent nucleocapsid gene. Proposed
membrane-spanning regions are overlined. 
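
The reading frame in Fig. 1 can be checked computationally. The Python sketch below is an illustration only, not the procedure used in this work. It assumes the 780-nucleotide sequence as transcribed from Fig. 1 (two characters that scanned ambiguously are read as T), the standard genetic code, and a hypothetical helper name, `translate_first_orf`. It locates the first ATG and translates until the first in-frame stop codon.

```python
# Illustrative check of the Fig. 1 reading frame (not the authors' method).
# DNA is the 780-nt sequence as transcribed from Fig. 1; two ambiguous
# characters in the scan are read as T here (an assumption).
DNA = (
    "CCTATAAGAGTGATTGGCGTCCGTACGTACCCTCTCAACTCTAAAACTCTTGTAGTTTAA"
    "ATCTAATCCAAACATTATGAGTAGTACTACTCAGGCCCCAGAGCCCGTCTATCAATGGAC"
    "GGCCGACGAGGCAGTTCAATTCCTTAAGGAATGGAACTTCTCGTTGGGCATTATACTACT"
    "CTTTATTACTATCATACTACAGTTCGGTTACACGAGCCGTAGCATGTTTATTTATGTTGT"
    "GAAAATGATAATCTTGTGGTTAATGTGGCCACTGACTATTGTTTTGTGTATTTTCAATTG"
    "CGTGTATGCGCTAAATAATGTGTATCTTGGATTTTCTATAGTGTTTACTATAGTGTCCAT"
    "TGTAATCTGGATTATGTATTTTGTTAATAGCATAAGGTTGTTTATCAGGACTGGTAGCTG"
    "GTGGAGCTTCAACCCCGAAACAAACAACCTTATGTGTATAGATATGAAAGGTACCGTGTA"
    "TGTTAGACCCATTATTGAGGATTACCATACACTAACAGCCACTATTATTCGTGGCCACCT"
    "CTACATGCAAGGTGTTAAGCTAGGCACCGGTTTCTCTTTGTCTGACTTGCCCGCTTATGT"
    "TACAGTTGCTAAGGTGTCACACCTTTGCACTTATAAGCGCGCATTCTTAGACAAGGTAGA"
    "CGGTGTTAGCGGTTTTGCTGTTTATGTGAAGTCCAAGGTCGGAAATTACCGACTGCCCTC"
    "AAACAAACCGAGTGGCGCGGACACCGCATTGTTGAGAATCTAATCTAAACTTTAAGGATG"
)

# Standard genetic code, built in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_first_orf(dna):
    """Return (start, stop, protein) for the ORF opening at the first ATG.

    start and stop are 0-based indices of the first base of the initiator
    and stop codons; protein is a one-letter amino acid string.
    """
    start = dna.find("ATG")
    if start < 0:
        raise ValueError("no ATG found")
    residues = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            return start, i, "".join(residues)
        residues.append(aa)
    raise ValueError("no in-frame stop codon")


if __name__ == "__main__":
    start, stop, protein = translate_first_orf(DNA)
    print("leader length:", start)
    print("residues:", len(protein))
    print(protein)
```

Under these assumptions the first ATG begins at nucleotide 77, consistent with a 76-nucleotide leader, and the open reading frame encodes 228 residues before the stop codon that precedes the nucleocapsid initiator.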


Two versions were found, in two different clones, for the
sequence immediately upstream from the E1 initiator codon.
The shorter one is shown in Fig. 1; in the second clone, an
additional copy of the pentanucleotide ATCTA was found
between nucleotides 65 and 66, making the sequence similar to
that of the region adjacent to the nucleocapsid gene of another
strain of MHV. This difference could represent a mutation;
alternatively, it may reflect heterogeneity in the normal mRNA
population. Indirect support for the latter possibility comes from
the observation that an RNase T1 oligonucleotide from this
region of RNA 6, corresponding to the shorter sequence, was
recovered in markedly lower yield than those from the rest of
the molecule. This site represents the point of fusion between
the 5’ leader sequence and the coding portion of the RNA. The
fusion is thought to occur by ‘jumping’ of the viral RNA
polymerase to particular sites on its genome-length,
negative-stranded template; the resumption of transcription then
produces each of the subgenomic mRNAs. Thus, it seems
possible that the polymerase may jump to more than one point
on the template for each mRNA, generating variable numbers
of the repeated pentanucleotide AUCUA in the resulting
transcript.
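
The two variants of the leader-body junction can be written out explicitly. The Python sketch below is illustrative only. It assumes the first 79 nucleotides of the Fig. 1 sequence as transcribed here, inserts ATCTA between nucleotides 65 and 66, and counts tandem copies of the pentanucleotide starting at nucleotide 61. The helper names `insert_after` and `tandem_copies` are hypothetical.

```python
# Nucleotides 1-79 of the Fig. 1 sequence (shorter variant), as transcribed
# from the figure; the transcription itself is an assumption of this sketch.
SHORT = (
    "CCTATAAGAGTGATTGGCGTCCGTACGTACCCTCTCAACTCTAAAACTCTTGTAGTTTAA"
    "ATCTAATCCAAACATTATG"
)


def insert_after(seq, position, fragment):
    """Insert fragment after the 1-based nucleotide position."""
    return seq[:position] + fragment + seq[position:]


def tandem_copies(seq, start, unit):
    """Count back-to-back copies of unit beginning at 1-based start."""
    index = start - 1
    copies = 0
    while seq.startswith(unit, index):
        copies += 1
        index += len(unit)
    return copies


# Longer variant: one extra ATCTA between nucleotides 65 and 66.
LONG = insert_after(SHORT, 65, "ATCTA")

if __name__ == "__main__":
    for name, seq in (("short", SHORT), ("long", LONG)):
        print(name, seq[60:], "copies:", tandem_copies(seq, 61, "ATCTA"),
              "first ATG at nucleotide", seq.find("ATG") + 1)
```

In this representation the shorter clone carries one copy of ATCTA at the junction and the longer clone two, with the initiator codon displaced by five nucleotides.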

Figure 1 shows the amino acid sequence encoded by the E1
gene. The predicted molecular weight of the protein is 26,000,
slightly higher than that observed by gel electrophoresis but
consistent with the unusual electrophoretic behaviour of this,
and other, hydrophobic proteins. Several features of the protein,
when assembled into membranes in the virus, or in vitro, are
reflected in the sequence. First, in contrast to the majority of
membrane proteins, E1 is known to lack a cleaved ‘signal
peptide’: the N-terminal region of the sequence contains no
good candidate for a cleavage site. Second, the N-terminal
region bears O-linked sugars, which, uniquely among viral
proteins so far studied, are the only known post-translational
modification to E1. Assuming that the terminal Met is
removed, the N-terminal sequence is Ser-Ser-Thr-Thr, which
is identical to the O-glycosylated amino terminus of M-type
glycophorin A (ref. 38). The O-linked sugars of E1 are themselves
identical to those found in glycophorin. Third, most of
the protein is resistant to proteolysis when assembled in the
membrane. Only 2.5 kilodaltons of polypeptide from the
N-terminus are cleavable on the luminal side of the membrane (or
outside the virion) and 1.5 kilodaltons from the C-terminus
from the cytoplasmic (or intra-virion) side, suggesting that the
protein is largely buried in the membrane. In the sequence, a
run of 22 uncharged residues from positions 26 to 47 represents
a potential membrane-spanning region; residues 1-25 correspond
to the portion removable by protease. A further sequence
of uncharged residues, positions 57-106, is sufficiently long to
cross the membrane twice more. If this region is divided in two,
and each half plotted as an α-helical ‘wheel’, all the polar side
chains of both sections cluster within 140°. Thus, a plausible
conformation for this region is two hairpinned helices in the
membrane, with adjacent polar faces (Fig. 2a). There are no
other long hydrophobic sequences, implying that the region
from residues 107 to ~190 is either folded in the membrane to
neutralize charges, or, more likely, is adjacent to the membrane
but resistant to proteolysis. These features are summarized in
Fig. 2b.
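
The size estimate and the uncharged stretches described above can be reproduced from the sequence. The Python sketch below is an illustration, not the procedure used in this work. It assumes the 228-residue translation of Fig. 1 as transcribed here (numbered with the initiator Met as residue 1), standard average residue masses, and a simple definition of ‘charged’ as Asp, Glu, Lys or Arg. The function names are hypothetical.

```python
# One-letter translation of the Fig. 1 open reading frame, as transcribed
# from the figure (an assumption of this sketch). Met is residue 1.
E1_PROTEIN = (
    "MSSTTQAPEPVYQWT"
    "ADEAVQFLKEWNFSLGIILL"
    "FITIILQFGYTSRSMFIYVV"
    "KMIILWLMWPLTIVLCIFNC"
    "VYALNNVYLGFSIVFTIVSI"
    "VIWIMYFVNSIRLFIRTGSW"
    "WSFNPETNNLMCIDMKGTVY"
    "VRPIIEDYHTLTATIIRGHL"
    "YMQGVKLGTGFSLSDLPAYV"
    "TVAKVSHLCTYKRAFLDKVD"
    "GVSGFAVYVKSKVGNYRLPS"
    "NKPSGADTALLRI"
)

# Average masses of amino acid residues (daltons), i.e. minus one water.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02


def molecular_weight(protein):
    """Sum of residue masses plus one water for the free termini."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


def uncharged_runs(protein, min_len=20, charged="DEKR"):
    """Return 1-based inclusive (start, end) runs lacking charged residues."""
    runs = []
    start = None
    for pos, aa in enumerate(protein, start=1):
        if aa in charged:
            # A charged residue closes any run that is currently open.
            if start is not None and pos - start >= min_len:
                runs.append((start, pos - 1))
            start = None
        elif start is None:
            start = pos
    if start is not None and len(protein) - start + 1 >= min_len:
        runs.append((start, len(protein)))
    return runs


if __name__ == "__main__":
    print("residues:", len(E1_PROTEIN))
    print("approximate mass:", round(molecular_weight(E1_PROTEIN)))
    print("uncharged runs (>= 20):", uncharged_runs(E1_PROTEIN))
```

With these assumptions the mass comes to roughly 26,000, and the only uncharged runs of 20 or more residues are positions 26-47 and 57-106, in agreement with the values quoted in the text.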

Which, if any, of these various features might be responsible 
for the protein’s intracellular localization? We do not know, for 
example, whether the protein has an active ‘signal’ causing its 
arrest on the transport pathway, or, alternatively, if it lacks a 
signal for onward transport; nor do we know whether a sorting 
process might operate on one or the other side of the membrane. 
The availability of a cDNA clone for the protein presents the 
opportunity to investigate these questions by allowing 
expression of the cloned DNA and in vitro mutagenesis. 

This approach has already been applied to two other viral 
glycoproteins, to investigate the importance of their cytoplasmic 
domains for transport to the cell surface, yielding opposite 
conclusions. An intrinsic problem with the method,
however, is the difficulty of distinguishing specific effects due to 
alterations at the site of mutagenesis, from a general structural 
disruption of the molecule. In this respect the E1 protein may 
be advantageous in that it provides the possibility of creating a 
more ‘active’ phenotype in the mutated molecule: specifically, 
particular alterations to the protein may result in its transport 
to the cell surface. 

We thank Willy Spaan for communicating results before publication,
G. Heisterberg-Moutsis (G. B. F. Braunschweig) for
help with oligonucleotide synthesis, Ben van der Zeijst for 
discussion and Annie Steiner for preparing the manuscript. J.A. 
was supported by fellowships from the Royal Society and the 


NATURE VOL. 308 19 APRIL 1984 




Fig. 2 a, Distribution of polar side
chains in the hydrophobic regions of
the E1 sequence. Residues 26-47
(1), 57-81 (2) and 82-106 (3) are
plotted as α-helices and viewed end-on.
Polar side chains are boxed:
proposed hydrophilic faces of helices 2
and 3 are indicated. b, Possible
topologies of the E1 protein across
the membrane. Arrows indicate sites 
accessible to protease; broken 
arrows represent inefficient pro- 